1 Introduction

Chromophobe renall cell carcinoma (KICH), is a rare form of kidney cancer affecting about 5% of all cancers arising from the kidney nephron. The Cancer Genome Atlas (TCGA) has comprehensively profiled this type of cancer in a patient cohort. Here we analyze the expression profiles of those patients, accessible in the form of a raw RNA-seq counts produced by Rahman et al. (2015) using a pipeline based on the R/Bioconductor software package Rsubread.

This document is written in R markdown and should be processed using R and you need to install the packages knitr and markdown. Moreover, it using the official style for Bioconductor vignettes facilitated by the Bioconductor package BiocStyle. Please consult that package documentation, and particularly the vignette on “Authoring R Markdown vignettes”, for full details on how to elaborate this kind of documents.

The specific instructions to generate the final HTML report are written in a Makefile. To run it you just need to type

$ make

on the unix shell. The makefile contains instructions to separately process the different files that integrate this report, and so only the file that has been modified will be rebuilt. If you wish to remove or add files to be processed, you should modify the makefile.

The directory results will contain resulting files produced during the analysis, however, figures are going to end by default directories associated with the source filenames from where they were created.

2 Quality assessment

2.1 Data import

We start importing the raw table of counts.

library(SummarizedExperiment)

se <- readRDS(file.path("rawCounts", "seKICH.rds"))
se
class: RangedSummarizedExperiment 
dim: 20115 91 
metadata(5): experimentData annotation cancerTypeCode
  cancerTypeDescription objectCreationDate
assays(1): counts
rownames(20115): 1 2 ... 102724473 103091865
rowData names(3): symbol txlen txgc
colnames(91): TCGA.KL.8323.01A.21R.2315.07
  TCGA.KL.8324.01A.11R.2315.07 ... TCGA.KO.8403.11A.01R.2315.07
  TCGA.KO.8415.11A.01R.2315.07
colData names(549): type bcr_patient_uuid ...
  lymph_nodes_aortic_pos_by_ihc lymph_nodes_aortic_pos_total

Explore the column (phenotypic) data, which in this case corresponds to clinical variables, and their corresponding metadata.

dim(colData(se))
[1]  91 549
colData(se)[1:5, 1:5]
DataFrame with 5 rows and 5 columns
                                 type                     bcr_patient_uuid
                             <factor>                             <factor>
TCGA.KL.8323.01A.21R.2315.07    tumor 83b3060c-2449-4581-88ef-817d126e4525
TCGA.KL.8324.01A.11R.2315.07    tumor 0697436f-e487-45db-b0bc-ad9246f70196
TCGA.KL.8325.01A.11R.2315.07    tumor 6101ffe6-2ef6-4256-9d6b-a7c545836995
TCGA.KL.8326.01A.11R.2315.07    tumor 03f3dd32-e1d1-485b-a968-3f55798f4d46
TCGA.KL.8327.01A.11R.2315.07    tumor 02979422-5149-4750-ad5f-483e0bec6ac5
                             bcr_patient_barcode form_completion_date
                                        <factor>             <factor>
TCGA.KL.8323.01A.21R.2315.07        TCGA-KL-8323            2012-7-24
TCGA.KL.8324.01A.11R.2315.07        TCGA-KL-8324            2012-7-24
TCGA.KL.8325.01A.11R.2315.07        TCGA-KL-8325            2012-7-24
TCGA.KL.8326.01A.11R.2315.07        TCGA-KL-8326            2012-7-24
TCGA.KL.8327.01A.11R.2315.07        TCGA-KL-8327            2012-7-24
                             prospective_collection
                                           <factor>
TCGA.KL.8323.01A.21R.2315.07                     NO
TCGA.KL.8324.01A.11R.2315.07                     NO
TCGA.KL.8325.01A.11R.2315.07                     NO
TCGA.KL.8326.01A.11R.2315.07                     NO
TCGA.KL.8327.01A.11R.2315.07                     NO
mcols(colData(se), use.names=TRUE)
DataFrame with 549 rows and 2 columns
                                                         labelDescription
                                                              <character>
type                                           sample type (tumor/normal)
bcr_patient_uuid                                         bcr patient uuid
bcr_patient_barcode                                   bcr patient barcode
form_completion_date                                 form completion date
prospective_collection            tissue prospective collection indicator
...                                                                   ...
lymph_nodes_pelvic_pos_total                               total pelv lnp
lymph_nodes_aortic_examined_count                           total aor lnr
lymph_nodes_aortic_pos_by_he                          aln pos light micro
lymph_nodes_aortic_pos_by_ihc                                 aln pos ihc
lymph_nodes_aortic_pos_total                                total aor-lnp
                                        CDEID
                                  <character>
type                                       NA
bcr_patient_uuid                           NA
bcr_patient_barcode                   2673794
form_completion_date                       NA
prospective_collection                3088492
...                                       ...
lymph_nodes_pelvic_pos_total          3151828
lymph_nodes_aortic_examined_count     3104460
lymph_nodes_aortic_pos_by_he          3151832
lymph_nodes_aortic_pos_by_ihc         3151831
lymph_nodes_aortic_pos_total          3151827

These metadata consists of two columns of information about the clinical variables. One called labelDescription contains a succint description of the variable, often not more self-explanatory than the variable name itself, and the other called ‘CDEID’ corresponds to the so-called Common Data Element (CDE) identifier. This identifier can be use in https://cdebrowser.nci.nih.gov to search for further information about the associated clinical variable using the Advanced search form and the Public ID attribute search.

Now, explore the row (feature) data.

rowData(se)
DataFrame with 20115 rows and 3 columns
           symbol     txlen      txgc
      <character> <integer> <numeric>
1            A1BG      3322 0.5644190
2             A2M      4844 0.4882329
3            NAT1      2280 0.3942982
4            NAT2      1322 0.3895613
5        SERPINA3      3067 0.5249429
...           ...       ...       ...
20111       POTEB      1706 0.4308324
20112    SNORD124       104 0.4903846
20113   SNORD121B        81 0.4074074
20114      GAGE10       538 0.5055762
20115   BRWD1-IT2      1028 0.5924125
rowRanges(se)
GRanges object with 20115 ranges and 3 metadata columns:
            seqnames               ranges strand |      symbol     txlen
               <Rle>            <IRanges>  <Rle> | <character> <integer>
          1    chr19 [58345178, 58362751]      - |        A1BG      3322
          2    chr12 [ 9067664,  9116229]      - |         A2M      4844
          9     chr8 [18170477, 18223689]      + |        NAT1      2280
         10     chr8 [18391245, 18401218]      + |        NAT2      1322
         12    chr14 [94592058, 94624646]      + |    SERPINA3      3067
        ...      ...                  ...    ... .         ...       ...
  100996331    chr15 [20835372, 21877298]      - |       POTEB      1706
  101340251    chr17 [40027542, 40027645]      - |    SNORD124       104
  101340252     chr9 [33934296, 33934376]      - |   SNORD121B        81
  102724473     chrX [49303669, 49319844]      + |      GAGE10       538
  103091865    chr21 [39313935, 39314962]      + |   BRWD1-IT2      1028
                 txgc
            <numeric>
          1 0.5644190
          2 0.4882329
          9 0.3942982
         10 0.3895613
         12 0.5249429
        ...       ...
  100996331 0.4308324
  101340251 0.4903846
  101340252 0.4074074
  102724473 0.5055762
  103091865 0.5924125
  -------
  seqinfo: 455 sequences (1 circular) from hg38 genome

To perform quality assessment and normalization we need first to load the edgeR R/Bioconductor package and create a `DGEList’ object.

library(edgeR)

dge <- DGEList(counts=assays(se)$counts, genes=mcols(se))
Warning in as.data.frame(x, row.names = NULL, optional = optional, ...):
Arguments in '...' ignored
saveRDS(dge, file.path("results", "dge.rds"))

Now calculate \(\log_2\) CPM values of expression and put them as an additional assay element to ease their manipulation.

assays(se)$logCPM <- cpm(dge, log=TRUE, prior.count=0.5)
assays(se)$logCPM[1:5, 1:5]
   TCGA.KL.8323.01A.21R.2315.07 TCGA.KL.8324.01A.11R.2315.07
1                     -1.176627                   -0.7690097
2                      8.735598                    8.5877309
9                     -6.527378                   -6.5273783
10                    -6.527378                   -6.5273783
12                     5.533634                    6.4826026
   TCGA.KL.8325.01A.11R.2315.07 TCGA.KL.8326.01A.11R.2315.07
1                     -1.143140                     3.174466
2                      8.255307                     8.171660
9                     -6.527378                    -6.527378
10                    -6.527378                    -6.527378
12                     6.125902                     5.973882
   TCGA.KL.8327.01A.11R.2315.07
1                      1.248823
2                      8.477338
9                     -6.527378
10                    -6.527378
12                     7.450862

2.2 Sequencing depth

Let’s examine the sequencing depth in terms of total number of sequence read counts mapped to the genome per sample. Figure 1 below shows the sequencing depth per sample, also known as library sizes, in increasing order.

Library sizes in increasing order.

Figure 1: Library sizes in increasing order

This figure reveals substantial differences in sequencing depth between samples and we may consider discarding those samples whose depth is substantially lower than the rest. To identify who are these samples we may simply look at the actual numbers including portion of the sample identifier that distinguishes them.

sampledepth <- round(dge$sample$lib.size / 1e6, digits=1)
names(sampledepth) <- substr(colnames(se), 6, 12)
sort(sampledepth)
KL.8327 KN.8431 KM.8476 KO.8409 KN.8433 KN.8432 KL.8332 KO.8408 KO.8415 
   18.2    20.7    27.3    27.5    28.8    31.3    33.5    34.6    35.1 
KL.8326 KO.8416 KM.8438 KL.8324 KO.8404 KM.8439 KO.8415 KN.8419 KO.8405 
   35.5    35.6    36.4    36.5    36.5    37.4    37.7    38.1    39.3 
KL.8342 KL.8338 KM.8477 KL.8328 KN.8422 KO.8403 KM.8443 KO.8407 KL.8340 
   39.8    40.0    40.4    41.1    41.1    41.3    41.8    42.4    42.6 
KN.8432 KL.8346 KO.8413 KM.8441 KN.8430 KL.8331 KL.8323 KL.8326 KN.8426 
   42.7    42.8    42.9    43.1    43.5    43.8    44.0    44.1    44.3 
KN.8421 KL.8337 KN.8436 KL.8345 KN.8434 KN.8437 KL.8335 KM.8440 KN.8424 
   44.4    44.8    44.9    45.0    45.0    45.1    45.4    46.2    46.6 
KN.8433 KN.8419 KN.8428 KO.8411 KN.8429 KO.8414 KL.8336 KO.8417 KN.8436 
   46.7    47.1    47.1    47.2    48.1    48.3    48.4    48.4    48.5 
KL.8339 KM.8442 KN.8437 KL.8343 KN.8435 KL.8333 KL.8324 KN.8425 KO.8406 
   48.8    49.4    49.4    49.5    49.8    50.2    50.3    50.6    50.8 
KO.8410 KN.8430 KL.8325 KN.8431 KL.8334 KN.8418 KL.8330 KN.8424 KN.8429 
   50.9    51.9    52.0    52.6    52.8    53.1    53.3    53.3    53.5 
KN.8434 KN.8422 KN.8423 KL.8339 KN.8428 KL.8336 KO.8403 KN.8427 KL.8329 
   53.5    53.6    53.9    54.4    54.9    55.8    55.9    56.0    56.2 
KL.8329 KL.8332 KN.8426 KM.8639 KN.8423 KL.8344 KN.8435 KN.8425 KL.8341 
   57.6    57.7    58.5    58.9    58.9    59.2    59.3    59.7    60.1 
KN.8427 
   62.2 

2.3 Distribution of expression levels among samples

Let’s look at the distribution of expression values per sample in terms of logarithmic CPM units. Due to the large number of samples, we display tumor and normal samples separately, and are shown in Figure 2

Non-parametric density distribution of expression profiles per sample.

Figure 2: Non-parametric density distribution of expression profiles per sample

We do not appreciate substantial differences between the samples in the distribution of expression values.

2.4 Distribution of expression levels among genes

Let’s calculate now the average expression per gene through all the samples. Figure 3 shows the distribution of those values across genes.

Distribution of average expression level per gene.

Figure 3: Distribution of average expression level per gene

2.5 Filtering of lowly-expressed genes

In the light of this plot, we may consider a cutoff of 1 log CPM unit as minimum value of expression to select genes being expressed across samples. Using this cutoff we proceed to filter out lowly-expressed genes.

mask <- avgexp > 1
dim(se)
[1] 20115    91
se.filt <- se[mask, ]
dim(se.filt)
[1] 11368    91
dge.filt <- dge[mask, ]
dim(dge.filt)
[1] 11368    91

Store un-normalized versions of the filtered expression data.

saveRDS(se.filt, file.path("results", "se.filt.unnorm.rds"))
saveRDS(dge.filt, file.path("results", "dge.filt.unnorm.rds"))

2.6 Normalization

We calculate now the normalization factors on the filtered expression data set.

dge.filt <- calcNormFactors(dge.filt)

Replace the raw log2 CPM units in the corresponding assay element of the SummarizedExperiment object, by the normalized ones.

assays(se.filt)$logCPM <- cpm(dge.filt, log=TRUE, normalized.lib.sizes=TRUE, prior.count=0.25)

Store normalized versions of the filtered expression data.

saveRDS(se.filt, file.path("results", "se.filt.rds"))
saveRDS(dge.filt, file.path("results", "dge.filt.rds"))

2.7 MA-plots

We examine now the MA-plots of the normalized expression profiles. We look first to the tumor samples in Figure 4.

MA-plots of the tumor samples.

Figure 4: MA-plots of the tumor samples

We do not observe samples with major expression-level dependent biases. Let’s look now to the normal samples in Figure 5.

MA-plots of the normal samples.

Figure 5: MA-plots of the normal samples

We do not observe either important expression-level dependent biases among the normal samples.

2.8 Batch identification

We will search now for potential surrogate of batch effect indicators. Given that each sample names corresponds to a TCGA barcode (see https://wiki.nci.nih.gov/display/TCGA/TCGA+barcode), following the strategy described in http://bioinformatics.mdanderson.org/main/TCGABatchEffects:Overview we are going to derive different elements of the TCGA barcode and examine their distribution across samples.

tss <- substr(colnames(se.filt), 6, 7)
table(tss)
tss
KL KM KN KO 
30  9 36 16 
center <- substr(colnames(se.filt), 27, 28)
table(center)
center
07 
91 
plate <- substr(colnames(se.filt), 22, 25)
table(plate)
plate
2315 2403 
  90    1 
portionanalyte <- substr(colnames(se.filt), 18, 20)
table(portionanalyte)
portionanalyte
01R 11R 21R 
 25  65   1 
samplevial <- substr(colnames(se.filt), 14, 16)
table(samplevial)
samplevial
01A 11A 
 66  25 

From this information we can make the following observations:

  • All samples were sequenced at the same center

  • All samples belong to one of two combinations of tissue type and vial, matching the expected tumor and normal distribution.

  • Samples were collected across different tissue source sites (TSS).

  • All samples were sequenced within the same plate, except for the following one:

colnames(se.filt)[plate == "2403"]
[1] "TCGA.KM.8639.01A.11R.2403.07"
  • All samples were sequenced using one of two portion and analyte combinations except fo the following one:
colnames(se.filt)[portionanalyte == "21R"]
[1] "TCGA.KL.8323.01A.21R.2315.07"

We are going to use the TSS as surrogate of batch effect indicator. Considering our outcome of interest as molecular changes between sample types, tumor vs. normal, we will examine now the cross-classification of this outcome with TSS.

table(data.frame(TYPE=se.filt$type, TSS=tss))
        TSS
TYPE     KL KM KN KO
  normal  6  0 17  2
  tumor  24  9 19 14

Observe that normal tissues with TSS=KM or TSS=KO are under-represented with respect to the tumor tissues. If TSS is a source of expression variability, this under-representation of those two TSS in the normal samples may lead to a potential confounding effect.

We examine now how samples group together by hierarchical clustering and multidimensional scaling, annotating the outcome of interest and the the surrogate of batch indicator. We calculate again log CPM values with a higher prior count to moderate extreme fold-changes produced by low counts. The resulting dendrogram is shown in Figure 6.

logCPM <- cpm(dge.filt, log=TRUE, prior.count=3)
d <- as.dist(1-cor(logCPM, method="spearman"))
sampleClustering <- hclust(d)
batch <- as.integer(factor(tss))
sampleDendrogram <- as.dendrogram(sampleClustering, hang=0.1)
names(batch) <- colnames(se.filt)
outcome <- paste(substr(colnames(se.filt), 9, 12), as.character(se.filt$type), sep="-")
names(outcome) <- colnames(se.filt)
sampleDendrogram <- dendrapply(sampleDendrogram,
                               function(x, batch, labels) {
                                 if (is.leaf(x)) {
                                   attr(x, "nodePar") <- list(lab.col=as.vector(batch[attr(x, "label")]))
                                   attr(x, "label") <- as.vector(labels[attr(x, "label")])
                                 }
                                 x
                               }, batch, outcome)
plot(sampleDendrogram, main="Hierarchical clustering of samples")
legend("topright", paste("Batch", sort(unique(batch)), levels(factor(tss))), fill=sort(unique(batch)))
Figure S6: Hierarchical clustering of the samples.

Figure 6: Figure S6: Hierarchical clustering of the samples

We can observe that samples cluster primarily by sample type, tumor or normal. TSS seems to have a stronger effect among the normal samples, while it distributes better among the tumor samples. We may consider discarding samples leading to an unbalanced distribution of the outcome across batches.

In Figure 7 we show the corresponding MDS plot. Here we see more clearly that the first source of variation separates tumor from normal samples. We can also observe that two tumor samples, corresponding to individuals KL-8404 and KN-8427 are separated from the rest, just as it happens in the hierchical clustering. A closer examination of their corresponding MA-plots also reveals a slight dependence of expression changes on average expression. We may consider discarding these two samples and doing the MDS plot again to have a closer look to the differences among the rest of the samples and their relationship with TSS.

plotMDS(dge.filt, labels=outcome, col=batch)
legend("bottomleft", paste("Batch", sort(unique(batch)), levels(factor(tss))),
       fill=sort(unique(batch)), inset=0.05)
Figure S7: Multidimensional scaling plot of the samples.

Figure 7: Figure S7: Multidimensional scaling plot of the samples

2.9 Session information

sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /opt/R/R-3.4.2/lib64/R/lib/libRblas.so
LAPACK: /opt/R/R-3.4.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8    
 [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8   
 [7] LC_PAPER=en_US.UTF8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats4    parallel  methods   stats     graphics  grDevices utils    
[8] datasets  base     

other attached packages:
 [1] geneplotter_1.56.0         annotate_1.56.1           
 [3] XML_3.98-1.10              AnnotationDbi_1.40.0      
 [5] lattice_0.20-35            edgeR_3.20.8              
 [7] limma_3.34.9               SummarizedExperiment_1.8.1
 [9] DelayedArray_0.4.1         matrixStats_0.53.1        
[11] Biobase_2.38.0             GenomicRanges_1.30.2      
[13] GenomeInfoDb_1.14.0        IRanges_2.12.0            
[15] S4Vectors_0.16.0           BiocGenerics_0.24.0       
[17] knitr_1.20                 BiocStyle_2.6.1           

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15           highr_0.6              RColorBrewer_1.1-2    
 [4] pillar_1.1.0           compiler_3.4.2         XVector_0.18.0        
 [7] bitops_1.0-6           tools_3.4.2            zlibbioc_1.24.0       
[10] bit_1.1-12             digest_0.6.15          memoise_1.1.0         
[13] tibble_1.4.2           evaluate_0.10.1        RSQLite_2.0           
[16] rlang_0.2.0            Matrix_1.2-12          DBI_0.7               
[19] yaml_2.1.18            xfun_0.1               GenomeInfoDbData_1.0.0
[22] stringr_1.3.0          locfit_1.5-9.1         rprojroot_1.3-2       
[25] bit64_0.9-7            grid_3.4.2             rmarkdown_1.8         
[28] bookdown_0.7           blob_1.1.0             magrittr_1.5          
[31] codetools_0.2-15       backports_1.1.2        htmltools_0.3.6       
[34] xtable_1.8-2           KernSmooth_2.23-15     stringi_1.1.6         
[37] RCurl_1.95-4.10       

3 Differential expression

We perform a simple examination of expression changes and their associated p-values using the R/Bioconductor package sva.

library(sva)
mod <- model.matrix(~ se.filt$type, colData(se.filt))
mod0 <- model.matrix(~ 1, colData(se.filt))
pv <- f.pvalue(assays(se.filt)$logCPM, mod, mod0)
sum(p.adjust(pv, method="fdr") < 0.01)
[1] 7000

There are 7000 genes changing significantly their expression at FDR < 1%. In Figure 8 below we show the distribution of the resulting p-values.

Distribution of raw p-values for an F-test on every gene between tumor and normal samples.

Figure 8: Distribution of raw p-values for an F-test on every gene between tumor and normal samples

Now, let’s estimate surrogate variables using the sva() function.

sv <- sva(assays(se.filt)$logCPM, mod, mod0)
Number of significant surrogate variables is:  18 
Iteration (out of 5 ):1  2  3  4  5  
sv$n
[1] 18

The SVA algorithm has found 18 surrogate variables. Let’s use them to assess againt the extent of differential expression this time adjusting for these surrogate variables.

modsv <- cbind(mod, sv$sv)
mod0sv <- cbind(mod0, sv$sv)
pvsv <- f.pvalue(assays(se.filt)$logCPM, modsv, mod0sv)
sum(p.adjust(pvsv, method="fdr") < 0.01)
[1] 8472

We have increased the number of changing genes to 8472. Figure 9 shows the resulting distribution of p-values.

Distribution of raw p-values for an F-test on every gene between tumor and normal samples, adjusting for surrogate variables estimated with SVA.

Figure 9: Distribution of raw p-values for an F-test on every gene between tumor and normal samples, adjusting for surrogate variables estimated with SVA

3.1 Session information

sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /opt/R/R-3.4.2/lib64/R/lib/libRblas.so
LAPACK: /opt/R/R-3.4.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8    
 [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8   
 [7] LC_PAPER=en_US.UTF8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C      

attached base packages:
[1] parallel  stats4    methods   stats     graphics  grDevices utils    
[8] datasets  base     

other attached packages:
 [1] sva_3.26.0                 BiocParallel_1.12.0       
 [3] genefilter_1.60.0          mgcv_1.8-23               
 [5] nlme_3.1-131.1             geneplotter_1.56.0        
 [7] annotate_1.56.1            XML_3.98-1.10             
 [9] AnnotationDbi_1.40.0       lattice_0.20-35           
[11] edgeR_3.20.8               limma_3.34.9              
[13] SummarizedExperiment_1.8.1 DelayedArray_0.4.1        
[15] matrixStats_0.53.1         Biobase_2.38.0            
[17] GenomicRanges_1.30.2       GenomeInfoDb_1.14.0       
[19] IRanges_2.12.0             S4Vectors_0.16.0          
[21] BiocGenerics_0.24.0        knitr_1.20                
[23] BiocStyle_2.6.1           

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1         xfun_0.1               splines_3.4.2         
 [4] htmltools_0.3.6        yaml_2.1.18            blob_1.1.0            
 [7] survival_2.41-3        rlang_0.2.0            pillar_1.1.0          
[10] DBI_0.7                bit64_0.9-7            RColorBrewer_1.1-2    
[13] GenomeInfoDbData_1.0.0 stringr_1.3.0          zlibbioc_1.24.0       
[16] codetools_0.2-15       evaluate_0.10.1        memoise_1.1.0         
[19] highr_0.6              Rcpp_0.12.15           xtable_1.8-2          
[22] backports_1.1.2        XVector_0.18.0         bit_1.1-12            
[25] digest_0.6.15          stringi_1.1.6          bookdown_0.7          
[28] grid_3.4.2             rprojroot_1.3-2        tools_3.4.2           
[31] bitops_1.0-6           magrittr_1.5           RCurl_1.95-4.10       
[34] RSQLite_2.0            tibble_1.4.2           Matrix_1.2-12         
[37] rmarkdown_1.8          compiler_3.4.2        

4 Functional analysis

Here we do the functional analysis.

4.1 Session information

sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /opt/R/R-3.4.2/lib64/R/lib/libRblas.so
LAPACK: /opt/R/R-3.4.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8    
 [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8   
 [7] LC_PAPER=en_US.UTF8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C      

attached base packages:
[1] parallel  stats4    methods   stats     graphics  grDevices utils    
[8] datasets  base     

other attached packages:
 [1] geneplotter_1.56.0         annotate_1.56.1           
 [3] XML_3.98-1.10              AnnotationDbi_1.40.0      
 [5] lattice_0.20-35            edgeR_3.20.8              
 [7] limma_3.34.9               SummarizedExperiment_1.8.1
 [9] DelayedArray_0.4.1         matrixStats_0.53.1        
[11] Biobase_2.38.0             GenomicRanges_1.30.2      
[13] GenomeInfoDb_1.14.0        IRanges_2.12.0            
[15] S4Vectors_0.16.0           BiocGenerics_0.24.0       
[17] knitr_1.20                 BiocStyle_2.6.1           

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15           RColorBrewer_1.1-2     pillar_1.1.0          
 [4] compiler_3.4.2         XVector_0.18.0         bitops_1.0-6          
 [7] tools_3.4.2            zlibbioc_1.24.0        bit_1.1-12            
[10] digest_0.6.15          memoise_1.1.0          tibble_1.4.2          
[13] evaluate_0.10.1        RSQLite_2.0            rlang_0.2.0           
[16] Matrix_1.2-12          DBI_0.7                yaml_2.1.18           
[19] xfun_0.1               GenomeInfoDbData_1.0.0 stringr_1.3.0         
[22] locfit_1.5-9.1         rprojroot_1.3-2        bit64_0.9-7           
[25] grid_3.4.2             rmarkdown_1.8          bookdown_0.7          
[28] blob_1.1.0             magrittr_1.5           backports_1.1.2       
[31] htmltools_0.3.6        xtable_1.8-2           stringi_1.1.6         
[34] RCurl_1.95-4.10       

References

Rahman, Mumtahena, Laurie K Jackson, W Evan Johnson, Dean Y Li, Andrea H Bild, and Stephen R Piccolo. 2015. “Alternative Preprocessing of Rna-Sequencing Data in the Cancer Genome Atlas Leads to Improved Analysis Results.” Bioinformatics 31 (22). Oxford University Press: 3666–72.